23 research outputs found

    Machine Learning in Enzyme Engineering

    Enzyme engineering plays a central role in developing efficient biocatalysts for biotechnology, biomedicine, and life sciences. Apart from classical rational design and directed evolution approaches, machine learning methods have been increasingly applied to find patterns in data that help predict protein structures, improve enzyme stability, solubility, and function, predict substrate specificity, and guide rational protein design. In this Perspective, we analyze the state of the art in databases and methods used for training and validating predictors in enzyme engineering. We discuss current limitations and challenges which the community is facing and recent advancements in experimental and theoretical methods that have the potential to address those challenges. We also present our view on possible future directions for developing the applications to the design of efficient biocatalysts.

    Predicting protein stability and solubility changes upon mutations: data perspective

    Understanding mutational effects on protein stability and solubility is of particular importance for creating industrially relevant biocatalysts, resolving mechanisms of many human diseases, and producing efficient biopharmaceuticals, to name a few. For in silico predictions, the complexity of the underlying processes and increasing computational capabilities favor the use of machine learning. However, this approach requires sufficient training data of reasonable quality for making precise predictions. This minireview aims to summarize and scrutinize available mutational datasets commonly used for training predictors. We analyze their structure and discuss the possible directions of improvement in terms of data size, quality, and availability. We also present perspectives on the development of mutational data for accelerating the design of efficient predictors, introducing two new manually curated databases, FireProtDB and SoluProtMutDB, for protein stability and solubility, respectively.

    FireProtDB: database of manually curated protein stability data

    The majority of naturally occurring proteins have evolved to function under mild conditions inside living organisms. One of the critical obstacles to the use of proteins in biotechnological applications is their insufficient stability at elevated temperatures or in the presence of salts. Since experimental screening for stabilizing mutations is typically laborious and expensive, in silico predictors are often used for narrowing down the mutational landscape. Recent advances in machine learning and artificial intelligence further facilitate the development of such computational tools. However, the accuracy of these predictors strongly depends on the quality and amount of data used for training and testing, which have often been reported as the current bottleneck of the approach. To address this problem, we present a novel database of experimental thermostability data for single-point mutants, FireProtDB. The database combines the published datasets, data extracted manually from the recent literature, and the data collected in our laboratory. Its user interface is designed to facilitate both types of expected use: (i) the interactive exploration of individual entries on the level of a protein or mutation and (ii) the construction of highly customized and machine learning-friendly datasets using advanced searching and filtering. The database is freely available at https://loschmidt.chemi.muni.cz/fireprotdb

    Primal-dual block-proximal splitting for a class of non-convex problems

    We develop block structure-adapted primal-dual algorithms for non-convex non-smooth optimisation problems, whose objectives can be written as compositions G(x) + F(K(x)) of non-smooth block-separable convex functions G and F with a nonlinear Lipschitz-differentiable operator K. Our methods are refinements of the nonlinear primal-dual proximal splitting method for such problems without the block structure, which itself is based on the primal-dual proximal splitting method of Chambolle and Pock for convex problems. We propose individual step length parameters and acceleration rules for each of the primal and dual blocks of the problem. This allows the methods to converge faster by adapting to the structure of the problem. For the squared distance of the iterates to a critical point, we show local O(1/N), O(1/N^2), and linear rates under varying conditions and choices of the step length parameters. Finally, we demonstrate the performance of the methods for the practical inverse problems of diffusion tensor imaging and electrical impedance tomography.
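
    For orientation, the problem class named in this abstract can be written out as below. This is only a sketch of the standard formulation: the block counts m and n, the block indices, and the symbol \hat{x} for a critical point are notation assumed here, and the saddle-point form is the usual Fenchel-conjugate reformulation for convex F rather than a statement quoted from the paper.

```latex
% Composite problem with block-separable G and F (block counts m, n are assumed notation)
\[
  \min_{x}\; G(x) + F(K(x)),
  \qquad
  G(x) = \sum_{j=1}^{m} G_j(x_j),
  \quad
  F(y) = \sum_{\ell=1}^{n} F_\ell(y_\ell).
\]
% Equivalent saddle-point form, using the convex conjugate F^* of F
\[
  \min_{x}\,\max_{y}\; G(x) + \langle K(x), y \rangle - F^{*}(y).
\]
% Local rates stated in the abstract, for the iterates x^N and a critical point \hat{x}
\[
  \| x^{N} - \hat{x} \|^{2} = O(1/N)
  \quad\text{or}\quad
  O(1/N^{2})
  \quad\text{or linear, depending on the conditions and step length choices.}
\]
```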

    Sensitive operation of enzyme-based biodevices by advanced signal processing

    Analytical devices that combine a sensitive biological component with a physicochemical detector hold great potential for various applications, e.g., environmental monitoring, food analysis or medical diagnostics. Continuous efforts to develop inexpensive sensitive biodevices for detecting target substances typically focus on the design of biorecognition elements and their physical implementation, while the methods for processing signals generated by such devices have received far less attention. Here, we present fundamental considerations related to signal processing in biosensor design and investigate how undemanding signal treatment facilitates calibration and operation of enzyme-based biodevices. Our signal treatment approach was thoroughly validated with two model systems: (i) a biodevice for detecting chemical warfare agents and environmental pollutants based on the activity of haloalkane dehalogenase, with the sensitive range for bis(2-chloroethyl) ether of 0.01-0.8 mM and (ii) a biodevice for detecting hazardous pesticides based on the activity of γ-hexachlorocyclohexane dehydrochlorinase with the sensitive range for γ-hexachlorocyclohexane of 0.01-0.3 mM. We demonstrate that the advanced signal processing based on curve fitting enables precise quantification of parameters important for sensitive operation of enzyme-based biodevices, including: (i) automated exclusion of signal regions with substantial noise, (ii) derivation of calibration curves with significantly reduced error, (iii) shortening of the detection time, and (iv) reliable extrapolation of the signal to the initial conditions. The presented simple signal curve fitting supports rational design of optimal system setup by explicit and flexible quantification of its properties and will find a broad use in the development of sensitive and robust biodevices.
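
    As a rough illustration of what extrapolating a fitted signal back to the initial conditions can look like, the sketch below fits a straight line by ordinary least squares and reads off its value at time zero. The abstract does not say which curve model the authors fit, so the linear model, the function names fit_line and extrapolate_to_start, and the sample values are all assumptions made for this example.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept (pure Python)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need at least two paired samples")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sums of squares and cross-products about the means
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0:
        raise ValueError("all x values are identical; slope is undefined")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


def extrapolate_to_start(times, signal):
    """Estimate the signal at time zero from the fitted line (hypothetical helper)."""
    _, intercept = fit_line(times, signal)
    return intercept


if __name__ == "__main__":
    # Made-up readings taken after the start of a measurement
    times = [1.0, 2.0, 3.0, 4.0]
    signal = [3.0, 5.0, 7.0, 9.0]
    print(fit_line(times, signal))
    print(extrapolate_to_start(times, signal))
```

    A real biosensor trace would usually call for a nonlinear model and a dedicated fitting routine; the linear case is used here only because it can be checked by hand.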

    Exploration of Protein Unfolding by Modelling Calorimetry Data from Reheating

    Studies of protein unfolding mechanisms are critical for understanding protein functions inside cells, de novo protein design as well as defining the role of protein misfolding in neurodegenerative disorders. Calorimetry has proven indispensable in this regard for recording full energetic profiles of protein unfolding and permitting data fitting based on unfolding pathway models. While both kinetic and thermodynamic protein stability are analysed by varying scan rates and reheating, the latter is rarely used in curve-fitting, leading to a significant loss of information from experiments. To extract this information, we propose fitting both first and second scans simultaneously. The four most common single-peak transition models are considered: (i) fully reversible, (ii) fully irreversible, (iii) partially reversible transitions, and (iv) general three-state models. The method is validated using calorimetry data for chicken egg lysozyme, mutated Protein A, three wild-types of haloalkane dehalogenases, and a mutant stabilized by protein engineering. We show that modelling of reheating increases the precision of determination of unfolding mechanisms, free energies, temperatures, and heat capacity differences. Moreover, this modelling indicates whether alternative refolding pathways might occur upon cooling. The Matlab-based data fitting software tool and its user guide are provided as a supplement.